Change client table from Log to Set#4598
Change client table from Log to Set#4598guoyuhong wants to merge 2 commits intoray-project:masterfrom
Conversation
|
Test PASSed. |
|
Test FAILed. |
|
It looks like when starting 2 node in the test, the driver will connect to a random node now... Some tests will fail in this case... |
8a47230 to
c6fc966
Compare
|
Test PASSed. |
|
Test FAILed. |
|
I think we'd better add a |
python/ray/tests/test_basic.py
Outdated
There was a problem hiding this comment.
This would be simpler.
node = cluster.add_node(num_cpus=3, resources={"CustomResource": 1})
available_plasma_socket_name = node.plasma_store_socket_name
|
Test PASSed. |
|
Test FAILed. |
|
Test FAILed. |
c3bca01 to
0a4b9ee
Compare
|
@hartikainen I have refined the temporary code which is used to fix #4140 . Could you take a look and test whether this fix is still working? |
|
Test PASSed. |
|
Test FAILed. |
|
@kfstorm Could you take a look? |
src/ray/gcs/tables.cc
Outdated
There was a problem hiding this comment.
What if multiple nodes call replace at the same time? Will there be race conditions?
There was a problem hiding this comment.
Yes, the monitor and the raylet may both try to change the status. I will change the Log level from FATAL to ERROR.
src/ray/gcs/tables.cc
Outdated
kfstorm
left a comment
There was a problem hiding this comment.
Can you explain why "we'd better keep information the clients that were connected before"? I think the code is still complicated with the is_connected flag.
There was a problem hiding this comment.
Should we consider about the scenario that LRU has removed the whole key before replace happens?
There was a problem hiding this comment.
I have replace the FATAL log to ERROR, so this may not cause crash even if the LRU mechanism has removed this entry.
src/ray/gcs/tables.cc
Outdated
There was a problem hiding this comment.
Does this have something to do with profile table?
There was a problem hiding this comment.
Good catch! This is totally wrong to use CreateProfileTableData. However, because ProfileTableData and SetReplaceEntryData have similar structure, there is no error caught.
python/ray/experimental/state.py
Outdated
0a4b9ee to
0a6467e
Compare
|
Test PASSed. |
|
@kfstorm If we do not keep that information, there is no way to know whether a node was connected or not. It's unlike the object table, we can reconstruct a missing object, but for client, this information is important. |
|
Test PASSed. |
|
Test FAILed. |
|
Test FAILed. |
stephanie-wang
left a comment
There was a problem hiding this comment.
Sorry for the late review. This looks good, thanks!
python/ray/tests/conftest.py
Outdated
There was a problem hiding this comment.
I think the previous version of this was okay (cluster.redis_address should not change).
There was a problem hiding this comment.
This flag is awkward. Can we remove it and just have this function always return REDISMODULE_OK? I would prefer if we just had the caller explicitly check the value and call RedisModule_ReplyWithSimpleString(ctx, "OK") if necessary. We can make the pattern a macro if it becomes cumbersome.
There was a problem hiding this comment.
I remove the flag and change the function return value to Status. Then we can call the reply function at the top layer.
There was a problem hiding this comment.
I think it is safer to not allocate a new variable called new_argv and instead pass in argv_vector.data() here. Also, I would probably then rename argv_vector to new_argv or something.
src/ray/gcs/tables.cc
Outdated
There was a problem hiding this comment.
The notification_mode should always be GcsTableNotificationMode::APPEND_OR_ADD, right? Can we make this a check instead of an if?
There was a problem hiding this comment.
We cannot do that, this is a callback for RequestNotifications. There will be messages for both APPEND_OR_ADD and REMOVE.
src/ray/gcs/tables.cc
Outdated
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
src/ray/gcs/tables.cc
Outdated
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
src/ray/gcs/tables.h
Outdated
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
|
LGTM. It will be better if we have a |
|
Test FAILed. |
8efb62a to
65f96a0
Compare
|
@raulchen I have resolved the conflict. It looks like that we need to do some code change to make this work with dynamic resources. It looks like I need to move this code snippet into node manager. And every time the resource updating request is received, the backend needs to look up GCS first and then change the resource correspondingly and then call |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Since Hash Table is preferred and there are racing condition in dynamic resource while using Set, this PR will be closed. The hash table PR is #4911 . |
What do these changes do?
is_insertionis changed tois_connectedwhich is more meaningful.Related issue number
#4154
#4140
Linter
scripts/format.shto lint the changes in this PR.